AITopics | access restriction

access restriction

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is artificial intelligence?" and have only a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

4b6e5dae3acb4cfdfe5928a6eff174ee-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-12-2026, 15:42:44 GMT

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)
Instructional Material (0.67)

Industry:

Media > Film (1.00)
Information Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
(5 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(6 more...)

HEMM: Holistic Evaluation of Multimodal Foundation Models

Liang, Paul Pu

Neural Information Processing Systems, Oct-10-2025, 01:38:54 GMT

To address this need, we contribute Holistic Evaluation of Multimodal Models (HEMM), visualized in Figure 1. As an evaluation framework, HEMM goes beyond conventional lists of datasets to emphasize holistic benchmarking at three levels: basic multimodal skills, information flow, and real-world use cases.

access restriction, arxiv preprint arxiv, dataset, (13 more...)

Neural Information Processing Systems

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)
Instructional Material (0.67)

Industry:

Media > Film (1.00)
Information Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
(5 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(6 more...)

MINT: Multimodal Instruction Tuning with Multimodal Interaction Grouping

Shan, Xiaojun, Cao, Qi, Han, Xing, Yu, Haofei, Liang, Paul Pu

arXiv.org Artificial Intelligence, Jun-10-2025

Recent advances in multimodal foundation models have achieved state-of-the-art performance across a range of tasks. These breakthroughs are largely driven by new pre-training paradigms that leverage large-scale, unlabeled multimodal data, followed by instruction fine-tuning on curated labeled datasets and high-quality prompts. While there is growing interest in scaling instruction fine-tuning to ever more numerous and ever larger datasets, our findings reveal that simply increasing the number of instruction-tuning tasks does not consistently yield better performance. Instead, we observe that grouping tasks by the common interactions across modalities, such as discovering redundant shared information, prioritizing modality selection with unique information, or requiring synergistic fusion to discover new information from both modalities, encourages the models to learn transferable skills within a group while suppressing interference from mismatched tasks. To this end, we introduce MINT, a simple yet surprisingly effective task-grouping strategy based on the type of multimodal interaction. We demonstrate that the proposed method greatly outperforms existing task-grouping baselines for multimodal instruction tuning, striking an effective balance between generalization and specialization.
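The grouping step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MINT implementation: it assumes every task has already been annotated with exactly one interaction type, and the labels `redundancy`, `uniqueness`, and `synergy` are shorthand for the three kinds of interaction the abstract names. The task names and their labels are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Shorthand labels (an assumption) for the three interaction kinds in the abstract:
# redundant shared information, unique information, and synergistic fusion.
INTERACTION_TYPES = ("redundancy", "uniqueness", "synergy")


@dataclass(frozen=True)
class Task:
    name: str         # hypothetical task identifier
    interaction: str  # assumed to be annotated per task; one of INTERACTION_TYPES


def group_by_interaction(tasks: list[Task]) -> dict[str, list[str]]:
    """Partition task names into one group per multimodal interaction type."""
    groups: dict[str, list[str]] = defaultdict(list)
    for task in tasks:
        if task.interaction not in INTERACTION_TYPES:
            raise ValueError(f"unknown interaction type: {task.interaction!r}")
        groups[task.interaction].append(task.name)
    # Return a plain dict so that looking up an empty group does not create it.
    return dict(groups)


if __name__ == "__main__":
    # Illustrative tasks and labels only; not MINT's actual task assignments.
    tasks = [
        Task("caption_retrieval", "redundancy"),
        Task("chart_question_answering", "uniqueness"),
        Task("sarcasm_detection", "synergy"),
        Task("image_text_matching", "redundancy"),
    ]
    for interaction, names in group_by_interaction(tasks).items():
        # In this sketch, each group would then be instruction-tuned as its own unit.
        print(interaction, names)
```

Keeping the grouping separate from training leaves open how each group is tuned (for example, one adapter or checkpoint per group). The abstract does not specify that choice, so the sketch stops at producing the groups.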

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2506.02308

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Energy (0.68)
(3 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

CLIMB: Data Foundations for Large Scale Multimodal Clinical Foundation Models

Dai, Wei, Chen, Peilin, Lu, Malinda, Li, Daniel, Wei, Haowen, Cui, Hejie, Liang, Paul Pu

arXiv.org Artificial Intelligence, Mar-20-2025

Recent advances in clinical AI have enabled remarkable progress across many clinical domains. However, existing benchmarks and models are primarily limited to a small set of modalities and tasks, which hinders the development of large-scale multimodal methods that can make holistic assessments of patient health and well-being. To bridge this gap, we introduce Clinical Large-Scale Integrative Multimodal Benchmark (CLIMB), a comprehensive clinical benchmark unifying diverse clinical data across imaging, language, temporal, and graph modalities. CLIMB comprises 4.51 million patient samples totaling 19.01 terabytes distributed across 2D imaging, 3D video, time series, graphs, and multimodal data. Through extensive empirical evaluation, we demonstrate that multitask pretraining significantly improves performance on understudied domains, achieving up to 29% improvement in ultrasound and 23% in ECG analysis over single-task learning. Pretraining on CLIMB also effectively improves models' generalization capability to new tasks, and strong unimodal encoder performance translates well to multimodal performance when paired with task-appropriate fusion strategies. Our findings provide a foundation for new architecture designs and pretraining strategies to advance clinical AI research. Code is released at https://github.com/DDVD233/climb.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.07667

Country:

South America > Brazil (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)
(23 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Dermatology (1.00)
(7 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
(6 more...)

HEMM: Holistic Evaluation of Multimodal Foundation Models

Liang, Paul Pu, Goindani, Akshay, Chafekar, Talha, Mathur, Leena, Yu, Haofei, Salakhutdinov, Ruslan, Morency, Louis-Philippe

arXiv.org Artificial Intelligence, Jul-3-2024

Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation of Multimodal Models (HEMM) to systematically evaluate the capabilities of multimodal foundation models along three dimensions: basic skills, information flow, and real-world use cases. Basic multimodal skills are internal abilities required to solve problems, such as learning interactions across modalities, fine-grained alignment, multi-step reasoning, and the ability to handle external knowledge. Information flow studies how multimodal content changes during a task through querying, translation, editing, and fusion. Use cases span domain-specific challenges introduced in real-world multimedia, affective computing, natural sciences, healthcare, and human-computer interaction applications. Through comprehensive experiments across the 30 tasks in HEMM, we (1) identify key dataset dimensions (basic skills, information flows, and use cases) that pose challenges to today's models, and (2) distill performance trends regarding how different modeling dimensions (e.g., scale, pre-training data, multimodal alignment, and pre-training and instruction-tuning objectives) influence performance. Our conclusions regarding challenging multimodal interactions, use cases, and tasks requiring reasoning and external knowledge, the benefits of data and model scale, and the impacts of instruction tuning yield actionable insights for future work in multimodal foundation models.
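To make the idea of locating challenging dataset dimensions concrete, here is a minimal sketch of one possible aggregation: average per-task scores within each category of a chosen dimension and report the lowest-scoring category. It is not HEMM's code or scoring protocol. The task names, category labels, and scores are hypothetical, and scores are assumed to be normalized so that higher is better and different tasks are comparable.

```python
from collections import defaultdict
from statistics import mean


def category_means(
    scores: dict[str, float],
    labels: dict[str, dict[str, str]],
    dimension: str,
) -> dict[str, float]:
    """Average per-task scores within each category of one dimension."""
    buckets: dict[str, list[float]] = defaultdict(list)
    for task, score in scores.items():
        # labels[task] maps each dimension name to that task's category.
        buckets[labels[task][dimension]].append(score)
    return {category: mean(values) for category, values in buckets.items()}


def hardest_category(
    scores: dict[str, float],
    labels: dict[str, dict[str, str]],
    dimension: str,
) -> str:
    """Return the category with the lowest mean score along `dimension`."""
    means = category_means(scores, labels, dimension)
    return min(means, key=means.get)


if __name__ == "__main__":
    # Hypothetical tasks, category labels, and normalized scores for illustration only.
    labels = {
        "task_a": {"basic_skill": "alignment", "information_flow": "querying",
                   "use_case": "multimedia"},
        "task_b": {"basic_skill": "reasoning", "information_flow": "fusion",
                   "use_case": "healthcare"},
        "task_c": {"basic_skill": "external_knowledge", "information_flow": "querying",
                   "use_case": "natural_sciences"},
    }
    scores = {"task_a": 0.72, "task_b": 0.41, "task_c": 0.55}
    for dim in ("basic_skill", "information_flow", "use_case"):
        print(dim, category_means(scores, labels, dim),
              "hardest:", hardest_category(scores, labels, dim))
```

Averaging within categories treats every task equally; a real benchmark might weight tasks or use per-task metrics that need separate normalization, which this sketch deliberately leaves out.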

access restriction, arxiv preprint arxiv, dataset, (13 more...)

arXiv.org Artificial Intelligence

2407.03418

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > Film (1.00)
Information Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)